Abstract


An extensive theoretical and qualitative literature stresses the promise of instructional practices 
and content aligned with minority students’ experiences. Ethnic studies courses provide an 
example of such “culturally relevant pedagogy” (CRP). Despite theoretical support, empirical 
evidence on the effectiveness of these courses is limited. We estimate the causal effects of an 
ethnic studies curriculum, using a “fuzzy” regression discontinuity design based on the fact that 
several schools assigned students with eighth-grade GPAs below a threshold to take the course. 
Assignment to this course increased ninth-grade attendance by 21 percentage points, GPA by 1.4 
grade points, and credits earned by 23. These surprisingly large effects suggest that CRP, when 
implemented in a high-fidelity context, can provide effective support to at-risk students. 

The racial and ethnic gaps that exist across a variety of important student outcomes in the 
United States are both disturbingly large and stubbornly persistent. For example, data from the 
recently released 2015 National Assessment of Educational Progress (NAEP) indicate that, on 
average, the mathematics knowledge of eighth-grade Black and Hispanic students in public 
schools lags behind that of their White peers by an amount equivalent to roughly two to three full 
years of learning (i.e., 0.84 and 0.59 standard deviations, respectively). Black and Hispanic 
students are also substantially overrepresented among students diagnosed with specific learning 
disabilities relative to their White peers (Aud, Fox, & KewalRamani, 2010). Furthermore, while 
roughly 14 percent of White students in public high schools fail to graduate on time, the 
corresponding dropout rates for Black and Hispanic students are roughly twice as large (Stetser 
& Stillwell, 2014). 

Awareness of such disparities is not new; concerns about unequal educational 
opportunities and outcomes have been documented over several decades in national policy 
reports, including the Coleman Report and A Nation at Risk (Coleman et al., 1966; National 
Commission on Excellence in Education, 1983). Recent work using nationally representative 
datasets over several decades suggests that, while racial achievement gaps have narrowed, they remain 
sizable (Reardon, 2011). The striking patterns identified by this work have motivated a broad 
array of aggressive federal, state, and local policies that have shaped the governance and 
operations of public schools over the last several decades. These contentious reforms have 
included new resources, different forms of school accountability, and choice (e.g., The 
Elementary and Secondary Education Act, its subsequent re-authorizations including No Child 
Left Behind, the Standards-based Education Reform in the 1990s, vouchers, and charters) as well 
as initiatives to promote effective teaching through performance-based compensation systems. 

Over the same period, historians and social scientists advocated for teaching the histories 
of specific race-ethnic groups to combat the harmful effects of segregation and a neglect of 
diverse histories (Banks, 1993). One prominent example was the development of “ethnic studies” 
(hereafter, ES) programs of study, which expanded in the wake of the U.S. Civil Rights 
Movement. ES refers to interdisciplinary programs of study that focus on the experiences of racial 
and ethnic minorities with a particular emphasis on historical struggles and social movements. 
The development and expansion of ES contributed to the contemporaneous development of 
multicultural education and its influence on school pedagogy. 

More recently, a fast-growing (and largely qualitative) research literature in education has 
focused on classroom pedagogy and stressed the importance of “culturally relevant pedagogy” 
(CRP) as a compelling way to unlock the educational potential of historically marginalized 
students (e.g., Ladson-Billings, 1992b, 1994, 1995; Ladson-Billings & Tate, 1995). The 
fundamental theoretical argument for CRP is that instructional practices are substantially more 
effective when differentiated to align with the distinctive cultural priors that individual students 
experience outside of school, and when they also affirm both cultural identity and critical social 
engagement (e.g., Gay, 2010). Modern ES courses provide a particularly prominent example of 
CRP. Apart from the relevance of ES content for students who are racial and ethnic minorities, 
ES courses often incorporate other elements of CRP through their emphasis on cultural identities 
and conscious engagement with social and political issues (Banks, 1997, 2012; Cammarota & 
Romero, 2009; Sleeter, 2011; Yosso, 2002, 2005). While some school districts are currently 
experiencing sustained political controversy over their use of ES curricula (e.g., Tucson), other 
major urban school districts (e.g., Los Angeles and San Francisco) have begun implementing 
new ES courses in hopes of supporting the academic achievement of their diverse student 
populations. In September 2016, Governor Brown of California signed a bill that will require the 
state to develop a model ES curriculum for high schools and encourage every school district and 
charter school in the state to offer the course beginning with the 2020-21 school year (Wang, 
2016). 

Although ES courses are proliferating, the available quantitative evidence on the causal 
effects of ES courses (and of culturally relevant pedagogy in general) on student outcomes is 
limited, particularly for larger-scale field settings. This study provides such evidence through 
examining the effects of a ninth-grade ES course piloted over several years in the San Francisco 
Unified School District (SFUSD). Specifically, using data on 1,405 students from five school- 
by-year cohorts, we examine the effects of ES participation for students on the margin of 
assignment to the ES course on several proximate academic outcomes (i.e., attendance, grade 
point average, and credits earned) that are highly relevant for high school persistence. Our 
research design identifies the causal effects of taking the ES course on key ninth-grade outcomes 
by leveraging an institutional feature that was unique to SFUSD. High school students in our 
study cohorts were assigned to take the ES course if they were identified as at-risk of dropping 
out (i.e., an eighth-grade GPA below 2.0). We estimate the effects of ES participation through a 
“regression discontinuity” (RD) design that effectively compares outcomes among students 
whose eighth-grade GPA placed them just below versus just above this threshold condition. RD 
designs such as this can credibly support causal inferences because they are based on the “as 
good as randomized” assignment to treatment that exists for students proximate to this threshold 
(D. S. Lee & Lemieux, 2010). 
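
To make the threshold comparison concrete, the following is a minimal sketch of the basic RD idea: fit a separate line to the outcome on each side of the GPA cutoff and compare the two fitted values at the cutoff. It is not the estimation code used in this study. The function names, the 0.5 bandwidth, and the data are illustrative assumptions, and the data are synthetic values constructed to contain a known jump of 0.3.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def rd_jump(gpa, outcome, cutoff=2.0, bandwidth=0.5):
    """Fitted outcome at the cutoff: below-cutoff side minus above-cutoff side.

    GPA is centered on the cutoff, so each side's intercept is its fitted
    value exactly at the threshold. The bandwidth is a hypothetical choice.
    """
    below = [(g - cutoff, y) for g, y in zip(gpa, outcome)
             if cutoff - bandwidth <= g < cutoff]
    above = [(g - cutoff, y) for g, y in zip(gpa, outcome)
             if cutoff <= g <= cutoff + bandwidth]
    intercept_below, _ = fit_line(*zip(*below))
    intercept_above, _ = fit_line(*zip(*above))
    return intercept_below - intercept_above


# Synthetic, noiseless data (NOT study data): a linear trend in GPA plus a
# jump of 0.3 for students whose GPA falls below the 2.0 cutoff.
synthetic_gpa = [i / 100 for i in range(150, 251, 5)]
synthetic_outcome = [1.0 + 0.8 * (g - 2.0) + (0.3 if g < 2.0 else 0.0)
                     for g in synthetic_gpa]
jump = rd_jump(synthetic_gpa, synthetic_outcome)
```

Because the synthetic data are exactly linear on each side, `jump` recovers the built-in discontinuity up to floating-point error. With real data, the estimate would also depend on the bandwidth and functional form, which is why robustness to alternative specifications matters.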

We find that ES participation had large, positive effects on each of our student outcomes. 
Specifically, ES participation increased student attendance (i.e., reduced unexcused absences) by 
21 percentage points, cumulative ninth-grade GPA by 1.4 grade points, and credits earned by 23 
credits. These GPA gains were larger for boys than for girls, and higher in math and science 
than in ELA. We find that these large effects are robust to a variety of model specifications as 
well as checks for possible confounds related to the treatment contrast we study (e.g., 
unobserved teacher effects, the possibly independent effects of an at-risk designation, “heaping” 
of the assignment variable). We also argue that these large effects are consistent with the 
hypothesis that participation in the course reduced the probability of dropping out in addition to 
possibly improving the performance of enrolled students. Overall, our findings indicate that a 
culturally relevant curriculum, implemented in a strongly supportive context, can be highly 
effective at improving outcomes among a diverse group of academically at-risk students. We 
also note that the effectiveness of this ES course may reflect other theoretical mechanisms (e.g., 
buffering students against “stereotype threat”) and that there are potentially serious challenges of 
successfully replicating and scaling up this curriculum. 

Cultural Relevance and Ethnic Studies in Theory and Practice 

Both academic and popular discussions have long emphasized the role that a 
community’s culture may play in amplifying or ameliorating achievement gaps. For example, an 
older and largely discredited literature from the 1960s (e.g., Bereiter & Engelmann, 1966; 
Deutsch, 1967; Hess & Shipman, 1965) suggested that achievement gaps reproduce themselves, 
in part, because racial and ethnic minorities enter school with a deficit of “cultural capital” (e.g., 
skills and dispositions related to the dominant culture) that could otherwise support student 
success. A more contemporary literature based on an influential article by Fordham and Ogbu 
(1986) has advanced the related argument that, in response to discrimination, minority 
communities develop an “oppositional peer culture” that effectively devalues educational effort 
and success as “acting White.” Several qualitative studies have strongly disputed this cultural 
characterization (e.g., Horvat & Lewis, 2003; O’Connor, 1997). Moreover, quantitative studies 
(e.g., Ainsworth-Darnell & Downey, 1998; Akerlof & Kranton, 2002; Cook & Ludwig, 1997; 
Downey & Ainsworth-Darnell, 2002; Tyson, Darity, & Castellino, 2005) have found little 
evidence to support the conjectured existence of an “oppositional” culture that contributes to 
achievement gaps. 

Another body of qualitative studies has shifted the focus to evidence that school and 
classroom practices are frequently misaligned with the cultural priors and out-of-school 
experiences of minority students (Banks, 1991; Gay, 1988; Ladson-Billings, 1992a; Valenzuela, 
1999). Specifically, several anthropological and sociolinguistic studies (e.g., Au & Jordan, 1981; 
Mohatt & Erickson, 1981) have provided evidence that teachers who are highly effective with 
minority students adopt culturally “appropriate” or “congruent” methods to engage their students 
(e.g., through their use of language and the design of classroom activities). In an influential body 
of work that drew, in part, on this earlier tradition, Ladson-Billings (1992b, 1994, 1995) 
examined and advocated for the practical and theoretical relevance of “culturally relevant 
pedagogy” (CRP). One key element of CRP is the use of valid cultural referents in teacher 
practice. Ladson-Billings (1992b) argues that CRP does more than “fit” school culture to student 
culture; it also seeks to “use” student culture as a basis for classroom practice and to enhance and 
affirm cultural competence, academic development, and social and political awareness. To 
support these developments in students, Ladson-Billings identified three domains of CRP that 
are embodied by culturally relevant teachers: their conceptions of self and others, how they 
structure social relations, and what conceptions of knowledge they hold. Culturally relevant 
teachers conceive of all students as capable of academic success, contribute to the community, 
and draw out knowledge from their students. They develop a collaborative community of 
learners in their classroom, connecting with students and maintaining fluid relationships. They 
also scaffold and bridge learning for students, and work with students to critically co-construct 
knowledge. 

Interestingly, independent disciplinary traditions can provide alternative theoretical 
frames for situating how CRP might be effective in improving the academic performance of 
minority students. For example, the social-psychological literature on “stereotype threat” 
suggests that minority students underperform in highly evaluative settings such as classrooms 
because of the anxiety created by the expectation of being viewed through the lens of a negative 
stereotype (Steele & Aronson, 1995). Several field-based randomized trials of interventions that 
“buffer” students against stereotype threat have shown promise in reducing achievement gaps, 
though their efficacy appears to be context-dependent (Aronson & Dee, 2012; Dee, 2015; 
Yeager & Walton, 2011). Interestingly, the active ingredients in these stereotype-threat buffers 
(e.g., forewarning about stereotypes, values affirmation, external attribution for experiencing 
challenges, and growth mindsets) closely parallel the defining elements of CRP. The theoretical 
logic for CRP can also be understood in a microeconomics framework in which students have 
imperfect information about their own suitability for academic pursuits. Benabou and Tirole 
(2003) argue that, in these circumstances, individuals adopt a “looking-glass” perspective in 
which they come to understand their own place in the world based, in part, on the cues they 
receive about themselves from others (e.g., schools and teachers). In such a setting, CRP may be 
effective because both cultural congruence and an emphasis on cultural affirmation and integrity 
create positive signals about belongingness in school. 

As commonly conceived and implemented, ES courses provide a prominent and 
controversial example of CRP. ES courses focus on the experiences, perspectives, and histories 
of traditionally underrepresented ethnic or racial groups and have several specific features. They 
are typically organized around the principle that CRP better engages underrepresented students 
and meets their needs by drawing on their cultural competencies to promote academic success. 
That is, ES courses are theorized to positively affect student outcomes through the creation of a 
relevant and meaningful curriculum that affirms students’ identities, draws from their funds of 
knowledge, and builds students’ critical intellectualism (Banks, 2012; Cammarota & Romero, 
2009; Giroux & Simon, 1989; Sleeter, 2011; Tintiangco-Cubales et al., 2015). To support this 
type of curriculum, ES courses often adopt alternative organizational and pedagogical structures 
following central lessons from CRP. For example, many ES courses utilize a classroom structure 
in which teachers promote engagement by structuring collaborative, equitable, reciprocal 
relationships between themselves and students, scaffolding student learning and drawing upon 
students as a source of knowledge (Duncan-Andrade & Morrell, 2008; Ladson-Billings, 1995; 
Sleeter, 2011; Tintiangco-Cubales et al., 2015). In addition to content that engages with students’ 
cultural identities, and a student-focused classroom structure, ES courses also draw from critical 
pedagogies, using an educational praxis to provide students with tools for identifying, reflecting 
upon, critiquing, and acting against systemic racism and other forms of oppression (Freire, 2000; 
Giroux & Simon, 1989; Sleeter, 2011; Sleeter & Bernal, 2004). Recent examples of ES 
coursework guide students in exploring their own identities and engaging with their community, 
often incorporating assignments that require repeated engagement with community and family 
members and some type of social activism (Ladson-Billings, 1995; Tintiangco-Cubales et al., 
2015). Proponents of ES also stress the positive effect that these courses will have on standard 


THE CAUSAL EFFECTS OF CULTURAL RELEVANCE - 9 


educational outcomes, such as students’ grades, test scores, and school completion, as well as 
student attitudes and behavior (Cabrera, Milem, Jaquette, & Marx, 2014; Kisker et al. 2012; 
Matthews & Smith, 1994; Tintiangco-Cubales et al., 2015). 

The first formal ES course was created at San Francisco State University in 1968, 
growing out of the civil-rights and anti-war movements. Some argue that ES as an idea has a 
longer history tracing back to Freedom Schools, Black independent schools, and tribal schools, 
among others, and builds on the work of African-American luminaries, including Williams, 
DuBois, and Woodson (Banks, 1993; Begay et al., 1995; C. D. Lee, 1992; Sleeter, 2011). Since 
their formalization at the post-secondary level, ES programs and curricula have spread to 
universities across the country, but are still relatively uncommon in secondary schools (Hurtado, 
Engberg, Ponjuan, & Landreman, 2002). Recently, several school districts have adopted or are 
considering adopting ES courses as graduation requirements (Gilbertson, 2014; Tucker, 2014). 

In spite of growing interest in ES, the expansion and implementation of ES programs is 
often highly contentious. Critics often characterize ES programs as divisive, non-academic, and 
detrimental to students of color because they are substituting courses that promote the 
development of ethnic pride in place of the development of mainstream academic skills (Sleeter, 
2011). When schools, colleges, and universities offer such courses or programs of study, they 
often become a contentious political flashpoint. For example, the school district in Tucson, 
Arizona, which had offered courses in Mexican-American studies, was recently found in 
violation of a new state law preventing the teaching of such courses as they “promote the 
overthrow of the United States government,” “promote resentment toward a race or class of 
people,” and “advocate ethnic solidarity instead of the treatment of pupils as individuals” 
(formerly Arizona HB 2281, 2010, Arizona Revised Statute § 15-112, 2010) and subsequently 
eliminated this programming under threat of losing state funding (Billeaud, 2011). Student 
protests of the school board meeting debating this policy and the ensuing controversy were 
covered by a diverse segment of the national media, including Fox News, The Daily Show with 
Jon Stewart, and the New York Times (Cabrera, Meza, Romero, & Rodriguez, 2013). 

At the same time, a growing number of other districts have expanded or are considering 
expanding their ES offerings. For example, the Los Angeles Unified School District, the El 
Rancho Unified School District, the Montebello Unified School District, the Sacramento Unified 
School District, the San Diego Unified School District, the Oakland Unified School District, and 
the Coachella Valley Unified School District have all recently included ES courses in their high 
school course offerings or graduation requirements and several others are considering motions to 
do so (Ethnicstudiesnow.com, 2016; Gilbertson, 2014; Wang, 2015; Tucker, 2014). Multiple 
legislative efforts, including a bill requiring all California high schools to offer ES courses, were 
vetoed by Governor Jerry Brown (Ceasar, 2015; Clark, 2015; Tucker, 2014). However, in 
September of 2016, Governor Brown signed a bill mandating the development of a model ES 
program for California’s public schools to utilize in creating their own programs (Wang, 2016). 
The Texas State Board of Education also recently approved legislation allowing school districts 
to develop courses on Mexican-American studies (Isensee, 2014). In addition, the Berkeley 
Unified School District has offered a freshman ES course for over 20 years, requiring it for high 
school graduation during nearly all of this time (Artz, 2003; Levin, 2009; Noguera, 1994; Rubin 
et al., 2006; Veale, 2015). As we describe in the next section, the motivating context for this 
study is that the San Francisco Unified School District (SFUSD) was considering scaling up 
access to a pilot ES curriculum and, possibly, designating it a graduation requirement. 

While the expansion of ES courses illustrates both their appeal and concerns, the 
quantitative evidence on their effects is relatively limited. Furthermore, the evidence that is 
available relies on research designs that cannot necessarily support credible causal inference. 
For example, a small-scale descriptive study by Cammarota (2007) focused on the “Social 
Justice Education Project” (SJEP), a “sub-curriculum” fielded among 17 at-risk Latina/o students 
in a Tucson high school over four semesters between 2003 and 2005. Cammarota (2007) reports 
that these students were successful both in completing high school and in engaging with 
advanced courses. A study by Lewis, Sullivan, and Bybee (2006) examined the effects of an 
“Emancipatory Education” course fielded over one semester among 65 eighth-grade students 
in an urban, predominantly Black school. They randomly assigned one of the two participating 
classes to receive this intervention and found positive effects on communal orientation, school 
connectedness, and achievement motivation, though not on achievement itself. The availability 
of only two assignment units within the same school (and the lack of evidence on balance at 
baseline) makes it difficult to differentiate the true effects of the course from the effects of other 
unobserved traits that may have differed across these two classrooms or spillovers of content and 
pedagogy between the two classrooms. 

Two other studies have relied on regression analyses of administrative data from the 
larger-scale implementation of ES in Tucson, Arizona. First, a brief report from the Arizona 
Department of Education (Franciosi 2009) compared the test performance of Hispanic students 
in Tucson who took one or more ES courses in the 2008-09 school year with Hispanic students 
statewide in regressions that controlled for other student traits (e.g., prior performance, mobility, 
and English learner status). This analysis found no evidence that course participation improved 
student performance. A more recent study by Cabrera, Milem, Jaquette, and Marx (2014) relied on 
administrative data from roughly 8,400 students over four cohorts (i.e., the graduating classes of 
2008-2011) to examine the Mexican-American studies (MAS) program offered in four schools in 
Tucson. In regression analyses that control for student demographic characteristics 
(race/ethnicity, gender, free/reduced price lunch eligibility, Census block median income, ELL, 
Special Ed, and GATE status, number of school transfers), prior academic achievement (ninth- 
and tenth-grade weighted GPA, tenth-grade standardized test scores), and school-level context 
(school fixed effects), they find evidence that MAS participation improved student outcomes. 
In particular, participation in MAS was associated with an increase in the probability of 
graduation of 9.5 percent across all cohorts. Among the subsample of students who initially 
failed the exit exam, MAS participation was associated with a 6.6 percent increase in the 
probability of passing all three exit exams (the reading, writing, and math AIMS tests), on 
average across all cohorts. 

A central challenge to these empirical studies is that participation in the MAS program 
was voluntary. Thus, regression-adjusted comparisons among those who did and did not enroll 
may suffer from omitted variable biases of an uncertain direction. For example, if students who 
have a latent and unobserved capacity for school engagement are more likely to enroll in these 
courses, naive regressions may overstate the program’s benefits. In contrast, if at-risk students 
are more likely to be enrolled in MAS courses, their effect is likely to be understated. Cabrera et 
al. (2014, page 1094) discuss these methodological challenges and acknowledge the limitations 
of their study noting, “our results may suffer from omitted variable bias and should not be 
considered true causal effects.” 

In sum, the theoretical arguments and public enthusiasm for ES curricula have not been 
matched by convincing quantitative evidence on their efficacy. Our study addresses this gap 
in the literature by employing a research design that can credibly support a strong causal warrant. 
Specifically, we rely on an explicit student assignment rule to identify the causal effects of a 
year-long ES course for students on the margin of participation in a regression discontinuity 
(RD) design. Our study is also unique in that it focuses on a mature, developed course situated 
within a novel setting (i.e., high schools in the San Francisco Unified School District). We 
describe our study context and research design in more detail below. 

Ethnic Studies in the San Francisco Unified School District (SFUSD) 

The genesis of the SFUSD ES curriculum was in 2007 when the District’s Board of 
Education Curriculum Committee urged the district to create a high school ES curriculum. The 
District’s Office of Learning Support and Equity, in collaboration with faculty from the College 
of Ethnic Studies at San Francisco State University (SFSU), subsequently initiated the 
curriculum design. Specifically, ten SFUSD social studies teachers formed the “Ethnic Studies 
Curriculum Collective” with SFSU faculty support. This group created a course framework 
drawing from ES curricula used in other districts and post-secondary programs across the 
country during the 2007-08 school year. Over the next two years, the Collective created lesson 
plans, piloted the lessons in three high schools, and met twice a month for lesson critique and 
development (SFUSD Ethnic Studies Curriculum Collective, 2012). 

On February 23, 2010, the SFUSD school board unanimously approved a resolution to 
implement an ES pilot program in SFUSD high schools, explicitly referencing the promise of ES 
courses to contribute to closing achievement gaps. Five high schools participated in the pilot, 
offering a year-long, ninth-grade ES course from the 2010-11 to 2012-13 school years. The 
program continued into the 2013-14 school year. In December of 2014 (i.e., after our study 
window), the school board voted to expand the program to be offered at all 19 of San Francisco’s 
high schools. It is also being considered as a ninth-grade graduation requirement (Dudnick, 
2014). 

The design of SFUSD’s ES course stressed the use of CRP as a way to engage with 
students who had previously felt marginalized by the traditional curriculum. Units focused on 
themes of social justice, discrimination, stereotypes, and social movements from U.S. history 
spanning the late 18th century until the 1970s. Unlike a curriculum that is specific to the history 
of one particular race/ethnic group (e.g., Tucson’s Mexican American Studies program), the 
SFUSD curriculum incorporated elements of histories and political struggles from multiple 
race/ethnic groups, many of which are not traditionally represented in U.S. social studies content. 
These units included examinations of the genocide of American Indians in California, portrayals 
of Asians, Latinos, and African Americans in the media, community resistance in historical 
Chinese and Latino neighborhoods in California, labor organizing during the Great Depression 
and World War II among African Americans and Filipino Americans, and social movements and 
educational reforms contributing to and stemming from the Civil Rights Movement. 

The course also encouraged students “to explore their individual identity, their family 
history, and their community history” and required students to design and implement service- 
learning projects based on their study of their local community. The designers of this curriculum 
hoped that these lessons and projects would increase students’ commitment to social justice and 
improve self-esteem. In addition to the civic and psychological goals of the ES program, the 
program’s stated intent was to close achievement gaps and reduce dropout rates (Office of 
Learning Support and Equity/Humanities, Academics and Professional Development, 2009; 
SFUSD Ethnic Studies Curriculum Collective, 2012). 

An additional component of the course, which cannot be disentangled from the content 
but is itself an integral piece of CRP, is the pedagogical style and method utilized in the ES 
courses. Drawing from CRP, ES teachers used methods designed to build upon and honor 
students’ cultural assets, experiences and perspectives, develop their critical consciousness, and 
create authentic, caring academic environments. This type of teaching was intended to ensure an 
environment that valued and encouraged the success of the students both within and outside of 
the classroom. ES teachers worked to help students bridge their academic, home, and community 
environments, creating assignments that connected home and school, and required students to 
engage with their communities (Tintiangco-Cubales et al., 2014). Through these methods, ES 
classrooms became spaces for reflexive, critical, empowered conversation in which students 
helped to co-create the content. 

While the ES curriculum was under development for several years and across several 
different high schools in San Francisco, the assignment of students varied. Some of the pilot 
schools chose to offer the ES course to all incoming ninth graders, while other schools used the 
program as an intervention for students identified as at-risk for academic failure through an 
early-warning system. The early-warning indicator (EWI) flagged students who, in eighth grade, 
had either an attendance rate below 87.5 percent or a GPA (excluding physical education) below 
2.0. Prior research had shown that, in SFUSD, these binary variables were highly predictive of 
dropping out of school. In our data, very few students had an attendance rate below the 87.5- 
percent threshold so the relevant “assignment variable” in our RD design is the eighth-grade 
GPA. Students whose eighth-grade GPA was below 2.0 were encouraged but not compelled to 
take the ES course. This partial compliance implies that our RD design is “fuzzy” and that, if 
the effect of taking the ES course is heterogeneous, our inferences carry external-validity 
caveats: the estimand may not be valid for all students at the discontinuity 
(Imbens & Angrist, 1994). We take up this and other related issues after describing our data and 
methods below. 
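
For reference, the estimand in a fuzzy RD design of this kind is commonly written as a ratio of two discontinuities at the threshold: the jump in the outcome divided by the jump in the probability of treatment. The notation below is an illustrative assumption rather than the paper's own: $Y_i$ is a ninth-grade outcome, $D_i$ indicates taking the ES course, and $G_i$ is eighth-grade GPA. Because assignment applies to students below 2.0, each difference is taken as the limit from below minus the limit from above.

```latex
\tau_{\text{FRD}}
  = \frac{\displaystyle \lim_{g \uparrow 2.0} E[Y_i \mid G_i = g]
          \;-\; \lim_{g \downarrow 2.0} E[Y_i \mid G_i = g]}
         {\displaystyle \lim_{g \uparrow 2.0} E[D_i \mid G_i = g]
          \;-\; \lim_{g \downarrow 2.0} E[D_i \mid G_i = g]}
```

Under the monotonicity condition discussed by Imbens and Angrist (1994), this ratio is interpreted as the average effect for students at the threshold whose enrollment responds to the assignment rule, which is the source of the external-validity caveat noted above.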

Data 

We examine the effect of SFUSD’s year-long ninth-grade ES course on student 
outcomes, primarily using data from three of the five high schools that piloted the curriculum. 
These three high schools assigned only some ninth-grade students to the ES course, while two 
other schools chose to offer the ES course to all ninth-grade students. These schools typically 
offered between two and four sections of the course in each year, although the course was not offered in 
all schools in every year. Our primary study sample draws from five unique school-year cohorts 
in these three high schools. In these five cohorts, enrollment in ES was encouraged, but not 
required, for students whose eighth-grade GPA was below 2.0. Students identified by the early- 
warning indicators (EWI) as at-risk of high school failure were automatically enrolled in the ES 
course when they received their course schedule at the start of their ninth-grade year. Students 
could opt out of the course after consulting with their academic counselor, but needed to actively 
select out of the course to do so. One school used this rule over three years (i.e., AY 2011-12 
through AY 2013-14) while two other schools used it in AY 2011-12 only. Critically, only four 
unique teachers taught the ES courses in these schools and years. We discuss, along with our 
other robustness checks, evidence indicating that our results are not simply due to effects unique 
to the effectiveness of these teachers. *" 

Our initial sample consists of ninth graders in these five school-year cohorts. Among
these cohorts, we exclude those who are missing our assignment variable: a recorded
eighth-grade GPA (n = 226). We also exclude 128 students with eighth-grade GPAs that are distant
from the threshold and clustered at a perfect 4.0 GPA. Similarly, we exclude a small number
(n = 27) of additional students with extremely low eighth-grade GPAs (i.e., less than 1.25).
These sample edits imply a final “intent-to-treat” (ITT) sample of 1,405 students. Our data on
these students include several measures of baseline (eighth grade) traits, all of which we include 
as controls in our regression models. These include binary indicators for gender and for whether 
the student was Black, Hispanic, or Asian (with White serving as the reference category). We
also include eighth-grade data on whether the student was in special education, ever suspended, 
or identified as an English Language Learner (ELL). In addition, we utilize data on each 
student’s attendance rate in eighth-grade, the value of their assignment variable (i.e., eighth- 
grade GPA exclusive of P.E. and centered on 2.0), and a binary indicator for our “intent-to-treat” 
(ITT) variable (i.e., an eighth-grade GPA less than 2.0). 
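
The sample restrictions and the two RD variables described above can be sketched as follows. This is a minimal illustration with hypothetical records and hypothetical names (`gpa8`, `build_itt_sample`), not the authors' code; it assumes that the excluded high-GPA cluster consists of GPAs exactly equal to 4.0.

```python
THRESHOLD = 2.0    # eighth-grade GPA cutoff for ES encouragement
LOW_CUTOFF = 1.25  # GPAs below this value are dropped as extremely low
PERFECT = 4.0      # the cluster of perfect GPAs is dropped (assumed exactly 4.0)

def build_itt_sample(records):
    """Apply the sample restrictions and add the RD variables.

    `records` is a list of dicts with a hypothetical key `gpa8`
    (eighth-grade GPA excluding P.E., or None if missing).
    """
    sample = []
    for rec in records:
        gpa = rec.get("gpa8")
        if gpa is None:        # missing assignment variable
            continue
        if gpa == PERFECT:     # clustered at a perfect 4.0
            continue
        if gpa < LOW_CUTOFF:   # extremely low GPA
            continue
        centered = gpa - THRESHOLD
        sample.append({**rec, "g_centered": centered, "itt": int(centered < 0)})
    return sample

# Made-up students for illustration only.
students = [
    {"id": 1, "gpa8": 1.8},
    {"id": 2, "gpa8": 2.0},
    {"id": 3, "gpa8": 4.0},
    {"id": 4, "gpa8": None},
    {"id": 5, "gpa8": 1.0},
    {"id": 6, "gpa8": 3.2},
]
itt_sample = build_itt_sample(students)
```

Note that a student with a GPA of exactly 2.0 receives `itt = 0`, matching the rule that only GPAs strictly below 2.0 trigger the encouragement.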

Table 1 presents descriptive statistics on these students for each of the baseline control 
variables. Interestingly, 60 percent of these students are of Asian descent and 23 percent are 
Hispanic. Only 6 percent of these students are Black. Eighteen percent of these students are 
identified as ELLs and 12 percent have special education status. Among the cohorts in our 
sample, only 42 percent are female. This is due in part to the fact that there are fewer female
students than male students in the district overall (48 percent across all SFUSD schools), but
particularly because female students tend to be higher-achieving and are therefore more likely
to be excluded from our sample (recall that we exclude students
who receive a perfect 4.0, which drives most of the difference in female representation between
the full district and our sample). Thirteen percent of students enrolled in the ES course and eight
percent of the sample had an eighth-grade GPA below 2.0 (i.e., an intent to treat by taking the ES
course). None of the students in our analysis sample are missing values for any of the baseline
control variables.

We examine three dependent variables in our analyses: ninth-grade attendance rates
(which the district refers to as instructional time), ninth-grade GPA, and ninth-grade credits
earned. The last two measures are defined exclusive of all social studies courses (e.g., the ES
course) and physical education. We also control for eighth-grade attendance and GPA in our
models. While the average attendance rate increases slightly between eighth and ninth grade
(from 96.32 percent to 96.69 percent), GPA declines substantially during this important
transition. The mean eighth-grade GPA is just above 3.0 (a “B” on the four-point scale); by the
end of ninth grade, the average GPA is 2.65 (a “C” on the four-point scale). Following the
reporting conventions of the district for the purposes of grade promotion and retention, we use 
un-weighted eighth- and ninth-grade GPAs that do not assign additional credits for honors or AP 
courses. 

We measure these outcomes for all students observed at baseline in our intent-to-treat 
sample regardless of whether they completed ninth grade. We therefore view the variation in these
measures as reflecting both the academic progress of enrolled students and the probability a 
student has dropped out of school. For students to advance from ninth to tenth grade, they must 
complete at least 55 credits. Because we exclude physical education (which would account for 
10 credits) and social studies (which would account for an additional 10), students should 
complete at least 35 credits by our measure in order to advance to tenth grade. In our sample, we 
find that 7.3 percent of students have fewer than 35 credits at the end of ninth grade (i.e., 
suggesting they dropped out or were required to repeat ninth grade). Furthermore, the students at 
risk of dropping out tend to be concentrated among those encouraged to take ES. Our results also 
appear to reflect changes in the performance of enrolled students. In particular, we find virtually
identical results to those we report below when we rely only on GPA from the first semester.
Regression Discontinuity (RD) Design

Our research design effectively compares those who were just eligible for assignment to 
the ES course (i.e., eighth-grade GPA below 2.0) to those who were just ineligible for this 
assignment (i.e., eighth-grade GPA at 2.0 or above). Specifically, we use a regression 
discontinuity (RD) design, which can provide causal inferences that are “as good as random 
assignment” (Lee and Lemieux 2010) in settings like this. An RD design asks whether, 
conditional on a student’s eighth-grade GPA, student outcomes “jump” at the threshold that
defined treatment eligibility (i.e., assignment to ES). The RD design can be implemented by 
estimating reduced-form equations of the following general form: 


$$Y_{ist} = \alpha + \beta \, \mathbf{1}(G_{ist} < 0) + f(G_{ist}) + \lambda X_{ist} + \eta_{st} + \varepsilon_{ist}$$

where $Y_{ist}$ is a student outcome (e.g., GPA) for ninth grader $i$ in school $s$ in year $t$. The variable
$G_{ist}$ is the “assignment variable” in this RD design: eighth-grade GPA centered on 2.0. The
parameter of interest, $\beta$, identifies the jump in outcomes when eighth-grade GPA is below 2.0,
conditional on $f(G_{ist})$, a smooth function of the assignment variable. We specify $f(G_{ist})$ as linear
but allow for different slopes above and below the threshold. We also explore flexibly
non-parametric specifications (i.e., local linear regressions). The variable $X_{ist}$ refers to student-level
controls and $\eta_{st}$ refers to fixed effects unique to each year at a particular school. We use robust
standard errors in all models shown below.
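
To make the specification concrete, the following is a minimal sketch of the reduced-form jump estimate in the simplest case: no covariates or fixed effects, and a linear function of the assignment variable with separate slopes on each side of the threshold. In that special case, the estimated jump equals the difference between the intercepts of two separate least-squares lines, one fit on each side of the cutoff. The data are synthetic and noiseless, and the function names are illustrative assumptions rather than the authors' code.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def rd_jump(g, y):
    """Jump in y at g = 0 for units with g < 0, relative to units with g >= 0."""
    below = [(gi, yi) for gi, yi in zip(g, y) if gi < 0]
    above = [(gi, yi) for gi, yi in zip(g, y) if gi >= 0]
    a_below, _ = fit_line(*zip(*below))
    a_above, _ = fit_line(*zip(*above))
    return a_below - a_above

# Synthetic data: centered GPA from -0.7 to 0.7, with a true jump of 0.4
# (and a different slope) for observations below the cutoff.
g = [k / 10 for k in range(-7, 8)]
y = [2.5 + 0.8 * gi + (0.4 + 0.3 * gi if gi < 0 else 0.0) for gi in g]
jump = rd_jump(g, y)
```

A full implementation would add the student-level controls, school-by-year fixed effects, and robust standard errors, which this sketch omits.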

In Table 2, we present the RD results from examining whether actually taking the ES 
course does indeed jump at the 2.0 threshold. We find robust evidence that the likelihood of 
taking the ES course jumps roughly 27 percentage points at the threshold. Figure 1 illustrates this 
finding graphically by showing the probability of taking ES as a function of eighth-grade GPA.
This figure organizes the data in bins of width 0.1 defined by eighth-grade GPA. The top panel
uses the full sample while the bottom panel uses data within a 0.7 GPA bandwidth of the
threshold. These figures consistently illustrate the jump in treatment status at the threshold. They 
also underscore that, as is common in RD and experimental settings, we have partial compliance 
with the intent-to-treat implied by an eighth-grade GPA below 2.0. Roughly 15 percent of 
students with eighth-grade GPAs of 2.0 or slightly higher took ES while just over 50 percent of 
students below the threshold did so. This partial compliance does not confound the internal 
validity of the RD design because the identifying variation is based on eighth-grade GPA rather 
than the decision to take the course. In other words, our reduced-form estimates identify the 
effect of being assigned to take the ES course (i.e., the “intent-to-treat” effect) rather than the 
effect of taking the course. To recover the estimated effect of actually taking the ES course (i.e., 
the “treatment-on-the-treated” effect) we divide our reduced-form estimates by the 
corresponding treatment uptake at the threshold (i.e., dividing our estimates by roughly 0.27). 
We calculate these instrumental-variables (IV), two-stage least squares (2SLS) estimates
following guidelines described in Angrist and Pischke (2014).

The fundamental treatment contrast leveraged in our study is between students eligible for
assignment to the ES course (i.e., those with eighth-grade GPAs below the 2.0 threshold) and 
those who were not. There are several potential threats to the validity of this contrast that we 
outline here before turning to our main results and to robustness checks. For example, to avoid 
any confounds related to different grading and attendance standards across the alternative 
courses students around this threshold took, we define our GPA and credits-earned measures 
excluding data from the ES course and all other social studies courses. A remaining concern is 
that taking ES may imply that a student takes different courses in other subject areas. In practice,
the majority of students who do not take ES enroll in either an alternate social studies course or a
variety of other elective courses including art, music, foreign languages, and study-skills courses,
among others. Although elective choice varies, we found that virtually all students were initially
enrolled in math, ELA, and science courses and that course selection in these subject areas did
not differ for students around the 2.0 GPA threshold. We also present results using GPA
measures specific to each of these three subjects. 

The strong causal warrant of the RD design relies critically on the maintained assumption 
that students’ locations just above and below the 2.0 threshold are conditionally random. One 
compelling way to check this key assumption is akin to examining covariate balance in a 
randomized experiment (Schochet et al., 2010). That is, we examine whether outcome-relevant 
student traits “jump” at the threshold which defines the intent to treat. In Table 3, we present the 
key results from auxiliary regressions that make these comparisons. Specifically, we present the 
results from RD regressions where observed student characteristics are the dependent variables. 
The estimated jumps in these variables at the 2.0 threshold are consistently small and statistically 
insignificant. The balance of observed student covariates around this threshold is consistent with 
the causal claim of the RD design. However, as with randomized experiments, we cannot be 
entirely certain that this mechanism balances the unobserved determinants of student outcomes. 

A related concern in RD designs is whether students differentially manipulate their 
eighth-grade GPA to place themselves on one side of the 2.0 threshold, potentially in response to 
something that is not observed by the researchers. In general, efforts to raise the value of a 
forcing variable do not invalidate an RD design (Lee and Lemieux 2010). However, if 
individuals can systematically manipulate their position relative to the threshold, it can impugn 
an RD’s internal validity. This is a particular concern in this context because eighth-grade GPA
scores “heap” at a value of 2.0 and other integer and half-integer values (see Figure 5a). Students
who earn an eighth-grade GPA of 2.0 may differ from those just below this value in unobserved
ways that are relevant to eighth-grade outcomes. The covariate balance at the threshold suggests 
that this is not an internal-validity threat. In addition, we report results based on samples where 
we eliminated heaped observations. We also see (Figure 5b) that, when we eliminate these heaps, 
the distribution of observations is smooth at the threshold (McCrary, 2008). 

Two other internal-validity concerns are unique to our study context. One is that our RD 
contrast may also identify any effects related to being flagged by an early-warning indicator. One 
way we examine this concern is to estimate our basic RD design using data from the other San 
Francisco high schools that did not offer ES over this period. If our RD design is valid, we 
expect to find null results at the GPA threshold in these schools. In contrast, if early-warning 
status had independent effects, we would expect to find evidence in these schools. A second 
concern is that our RD framework may identify the effect of the four unique teachers in our 
study sample rather than the effect of the course per se. We investigate this issue by examining 
the comparative effectiveness of these teachers in the other courses they taught. We discuss these 
and other critical robustness checks as we outline our results below. 

Main Results 

Table 4 presents the main RD results examining the effects of ES eligibility on ninth- 
grade attendance, GPA, and credits earned for students at the threshold. The baseline 
specification (i.e., the first column for each of the three outcomes) controls for the variable of 
interest (i.e., a binary indicator for whether the student had an eighth-grade GPA below 2.0), 
eighth-grade GPA, and a linear spline that allows this assignment variable to have distinct effects 
above and below the threshold. The subsequent specifications introduce controls for gender,
race/ethnicity, eighth-grade special education and ELL designations, eighth-grade
attendance, and whether the student was ever suspended in eighth grade. These saturated
specifications yield largely similar results, although the magnitude of the point estimates is 
reduced somewhat. Results from the most parsimonious to the most inclusive specifications 
consistently indicate that students with eighth-grade GPAs at the 2.0 threshold saw statistically 
significant improvements on all three ninth-grade academic outcomes. Drawing from the least- 
restrictive model, we find robust evidence that attendance jumped by 5.6 percentage points for 
students at the 2.0 threshold, GPA increased by 0.39 points, and credits earned increased by 6.3 
credits. 

Figures 2, 3, and 4 provide graphical illustrations of these RD results. Figure 2 plots 
students’ ninth-grade attendance against their eighth-grade GPA, with a line indicating the 2.0
GPA cutoff. Figure 3 plots the relationship between eighth-grade GPA and ninth-grade GPA 
(excluding social studies and P.E.). Figure 4 plots the relationship between eighth-grade GPA 
and ninth-grade credits earned. Each of the figures shows a discontinuity at the 2.0 threshold, 
echoing the regression results shown in Table 4. 

The instrumental-variable (IV) estimates implied by these results indicate that taking ES 
increased attendance by 21 percentage points, GPA by 1.4 grade points, and credits earned by 23 
credits (or roughly four courses). We calculate these estimated effects of taking ES by inflating 
the effects of ES eligibility on academic outcomes (Table 4) by the effect of ES eligibility on ES 
take-up, following Angrist and Pischke (2014). This amounts to dividing the reduced-form 
effects in Table 4, Columns 3, 6, and 9, by roughly 0.273, the coefficient reported in Table 2, 
Column 3, which represents the jump in ES uptake for students below versus above the 2.0 
threshold (or multiplying by the inverse, approximately 3.7). For example, we obtain the effect
of taking ES on GPA by dividing the coefficient for the jump in GPA at the 2.0 threshold shown
in Table 4, Column 6 (i.e., 0.387) by 0.273 (i.e., the change in ES uptake at the threshold). This
estimate indicates that students who took ES had an average end-of-year GPA that was 1.4 grade 
points higher than students who did not take the ES course. 
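
The scaling described above is simple arithmetic. The sketch below reproduces it using the rounded values quoted in the text (a first-stage jump of 0.273 and reduced-form jumps of 5.6 percentage points, 0.387 grade points, and 6.3 credits). Because these inputs are rounded, the results only approximate the reported IV estimates; the function and key names are illustrative.

```python
FIRST_STAGE = 0.273  # jump in ES take-up at the threshold, as quoted in the text

def wald_estimate(reduced_form, first_stage=FIRST_STAGE):
    """Scale an intent-to-treat (reduced-form) effect by the take-up jump."""
    return reduced_form / first_stage

# Reduced-form jumps at the 2.0 threshold, as quoted in the text.
reduced_form = {"attendance_pp": 5.6, "gpa": 0.387, "credits": 6.3}
iv = {name: wald_estimate(rf) for name, rf in reduced_form.items()}

for name, value in iv.items():
    print(f"{name}: {value:.1f}")
```

The resulting values round to roughly 21 percentage points of attendance, 1.4 grade points, and 23 credits, in line with the estimates reported above.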

These effect sizes (e.g., roughly 1.5 times the size of the corresponding standard 
deviations in Table 1 for the GPA result) are quite large for interventions situated in field 
settings. As a result, several considerations should be noted. First and foremost, because we 
define these outcome measures for all students observed at baseline, some of these striking gains 
are likely to reflect reductions in dropping out as well as gains in the performance of enrolled 
students. Second, RD estimates like ours are effectively defined for students close to the 2.0 
GPA threshold. These tend to be students who are at considerable academic risk, so larger gains 
in academic performance are possible. We take up such issues of treatment effect heterogeneity 
after first exploring the robustness of our main findings. 

Robustness Checks 

Given the consistent, large findings across a variety of ninth-grade outcomes, we next 
turn to examining the robustness of the apparent effects associated with the eighth-grade 2.0
GPA discontinuity. One possible confounding explanation for these findings is that they reflect 
the effects of the early-warning indicator (EWI) rather than the ES course. In other words, 
students might be receiving other services and interventions as a result of the EWI identification 
and this designation or these services might be driving changes in student outcomes rather than 
ES. To examine this concern, we estimated the same RD specifications using similarly 
constructed data from SFUSD high schools that did not offer an ES course. We present these
results in Table 5. The small and statistically insignificant coefficients for each specification and
for each of the three outcomes (i.e., there are no jumps at the 2.0 threshold in these schools)
indicate that EWI did not have an empirically meaningful effect on ninth-grade outcomes. These
null results are consistent with the hypothesis that the Table 4 results reflect the effects of taking 
ES rather than the effects of an EWI designation. 

An additional concern relates to the fact that student grades are reported in whole grade
points, leading to large clusters of students with GPAs at integer or half-integer GPA
values (e.g., 3.0 and 3.5 rather than 2.99). As has been shown in other work using regression
discontinuities to estimate causal effects, results can be biased by this heaping of the assignment 
variable (Barreca, Guldi, Lindo, & Waddell, 2011; Lee & Card, 2008). We present several 
robustness specifications in Table 6 to examine whether our results are being driven by the
preponderance of integer and half-integer eighth-grade GPAs by excluding students with several
specific values. In these “donut RDs,” we first exclude students with eighth-grade GPAs of
exactly 2.0. In a second version, we exclude students with any whole- or half-integer value for their
eighth-grade GPA. For each of the ninth-grade academic outcomes, the point estimates presented
in Table 6 are from individual regressions and correspond to the indicator for an eighth-grade
GPA below 2.0, akin to the point estimates shown in Table 4 from models including student
controls, with the first row replicating these estimates exactly.
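
The two “donut” samples can be sketched as simple filters on eighth-grade GPA. This is an illustration with made-up GPA values; the helper names and the floating-point tolerance are assumptions, not details taken from the study.

```python
def is_half_or_whole(gpa, tol=1e-9):
    """True if gpa is a whole or half-integer value (e.g., 2.0, 2.5, 3.0)."""
    doubled = gpa * 2
    return abs(doubled - round(doubled)) < tol

def donut_exact_threshold(gpas, threshold=2.0, tol=1e-9):
    """First donut: drop observations exactly at the threshold."""
    return [g for g in gpas if abs(g - threshold) >= tol]

def donut_all_heaps(gpas):
    """Second donut: drop observations at any whole or half-integer GPA."""
    return [g for g in gpas if not is_half_or_whole(g)]

# Made-up eighth-grade GPAs for illustration only.
gpas = [1.5, 1.83, 2.0, 2.17, 2.5, 2.67, 3.0]
first_donut = donut_exact_threshold(gpas)
second_donut = donut_all_heaps(gpas)
```

The RD specification would then be re-estimated on each filtered sample and compared with the full-sample estimate.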

The results in Table 6 show that our inferences are robust in specifications that exclude 
students whose eighth-grade GPA fell on the heaped values of 2.0 as well as other integer and
half-integer values. Each of the coefficients for all three of the ninth-grade academic outcomes is
statistically significant at the 5 percent level and the magnitude of the coefficients is fairly
consistent whether or not the students with GPAs of 2.0 or any integer or half-integer value are
included in the sample.

Table 7 presents another important robustness check based on restricting the estimation
sample to observations in increasingly tight bandwidths around the threshold for both the first- 
stage and reduced-form effects. These results provide evidence about whether the estimates are 
biased due to functional-form assumptions or are unduly influenced by observations that are far 
from the 2.0 GPA threshold. The results in Table 7 indicate that both the first-stage and reduced- 
form estimates are robust as the sample shrinks with each of the progressively tighter 
bandwidths, including a bandwidth that is within half of a grade point from the 2.0 threshold. If 
anything, the first-stage and reduced-form estimates become larger as the bandwidth tightens. 

Table 8 presents another robustness check based on simultaneously estimating jumps at 
the GPA threshold that actually influenced assignment to the ES course, 2.0, and at other 
“placebo” thresholds that have no relevance. We examine six placebo thresholds at each quarter- 
integer interval between GPAs of 1.0 and 3.0. Across both the first-stage and reduced-form 
estimates, the only statistically significant effects are observed at the 2.0 threshold, with one 
exception. Students just below the 2.25 GPA cutoff earn
significantly fewer ninth-grade credits than students on the other side of this cutoff. With this
exception, the nearly universal lack of statistically significant effects at these false thresholds is 
consistent with the absence of specification error. 
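
The placebo cutoffs can be enumerated directly. The sketch below assumes that “between GPAs of 1.0 and 3.0” excludes both endpoints, which is the reading that yields six placebo values once the true 2.0 threshold is set aside; the variable and function names are illustrative.

```python
TRUE_THRESHOLD = 2.0

# Quarter-point steps strictly between 1.0 and 3.0 (k/4 for k = 5, ..., 11).
candidates = [k / 4 for k in range(5, 12)]
placebo_thresholds = [c for c in candidates if c != TRUE_THRESHOLD]

def below_indicators(gpa, thresholds=placebo_thresholds):
    """One 'below cutoff' dummy per placebo threshold for a given GPA."""
    return {c: int(gpa < c) for c in thresholds}

# Example: a student with a 2.1 GPA is below the 2.25, 2.5, and 2.75 cutoffs.
example = below_indicators(2.1)
```

In a simultaneous specification, these dummies would enter the regression alongside the indicator for the true 2.0 threshold.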

A final robustness check stems from the particular implementation of the ES curriculum 
in SFUSD. While ES was piloted at five high schools over several years, assignment to ES was 
based on the EWI in only five school-year cohorts at three schools. In each of these school-year 
cohorts, only one teacher taught ES, leaving us with a total of four unique teachers during our 
study window. This raises the possibility that the effects we observe reflect factors
unique to these teachers rather than the ES curriculum itself. To investigate this concern, we
examined the effectiveness of ES teachers relative to their peers, when teaching courses other
than ES. We began by identifying all of the non-ES courses taught by our four ES teachers in 
any of the study years and then identified all of the other teachers of those same courses. The 
majority of these courses were social studies courses, such as U.S. and world history, but the list 
also included some college counseling and homeroom-type courses, which we chose to exclude 
from the analysis. We focused on students in these social studies courses who had not taken ES. 
We then recovered teacher fixed-effect estimates from regression models predicting each of our 
ninth-grade student outcomes (ninth-grade overall and subject-specific GPA, credits earned, and 
attendance), conditional on eighth-grade student controls and school by year fixed effects. For 
each of our outcomes, we examined the relative rankings of the teacher fixed effects to determine 
if the ES teachers were over-represented among teachers who had the largest fixed-effects 
estimates. Across each of the outcome measures, we found the ES teachers to be spread 
throughout the distribution of teacher fixed-effect estimates. Wilcoxon rank-sum tests further
suggest that the fixed effects of ES teachers are not differently distributed from those of non-ES 
teachers in the same subjects. Of the four ES teachers, one fairly consistently had the largest 
fixed-effect estimate. To ensure that this generally more effective teacher was not driving our 
results, we re-estimated our key ES results without this teacher. Doing so did not qualitatively 
alter the previously reported findings. 
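
For readers unfamiliar with the rank-sum comparison, here is a minimal sketch using the large-sample normal approximation without a tie correction. The fixed-effect values are invented for illustration and the function name is an assumption; with only four ES teachers, an exact small-sample version of the test may be preferable to this approximation.

```python
import math

def rank_sum_test(x, y):
    """Wilcoxon rank-sum z statistic and two-sided p-value (normal approximation)."""
    pooled = sorted((v, grp) for grp, vals in ((0, x), (1, y)) for v in vals)
    # Assign average ranks to tied values.
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, grp) in zip(ranks, pooled) if grp == 0)
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical teacher fixed effects (not from the study).
es_teachers = [0.10, -0.05, 0.30, 0.02]
other_teachers = [0.08, -0.12, 0.15, 0.00, 0.21, -0.03, 0.05]
z, p = rank_sum_test(es_teachers, other_teachers)
```

A large p-value in such a comparison would be consistent with the two groups of fixed effects being drawn from the same distribution.
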
Treatment-Effect Heterogeneity 

Our main results may obscure several forms of treatment-effect heterogeneity that are 
worth noting and exploring. For example, one well-known caveat about external validity 
involves the “localness” of RD estimates. That is, because our research design leverages the
targeting of ES courses to at-risk students, our resulting estimates do not speak to the effects
these courses might have on students with high performance in eighth grade. Second, the
effect of taking the ES course could conceivably vary across students with different demographic 
traits. In Table 9, we present evidence on this issue by showing the first-stage and reduced-form 
estimates in samples defined by gender and race/ethnicity. The point estimates show that there 
are consistently positive effects across male, female, Asian and Hispanic groups of students. 
While generally positive overall, the improved outcomes are particularly concentrated among 
boys and statistically insignificant for girls. For Hispanic students, the estimated effects are 
consistently large and statistically significant across all ninth-grade outcomes. For Asian 
students, while each of the point estimates is positive, they are only significant for the first-stage 
and ninth-grade instructional-time effects. This suggests that, while the ES course is on average 
not harmful for any of the enrolled students, it is particularly good for male students and 
Hispanic students. 

In Table 10, we examine whether there are heterogeneous effects on student GPA by 
subject. Each cell in this table reports the key RD estimate (i.e., the estimated “jump” at the 2.0 
threshold) from a unique regression. The first column presents point estimates conditional on 
linear splines of the assignment variable and on school-by-year fixed effects. The subsequent 
models introduce student and eighth-grade covariates. The point estimates show that there are 
consistently positive, statistically significant effects on GPA specific to math and to science, 
despite the distal nature of their respective content to that of ES. In ELA however, while the 
point estimates remain positive, they are smaller and statistically insignificant. 

The literature on causal inference has also recently emphasized another possible (and 
subtler) form of treatment heterogeneity based on the potential-outcomes framework and how
individuals respond to their intent-to-treat (i.e., as “compliers”, “always takers”, and “never
takers”). Specifically, Imbens and Angrist (1994) show that, when treatment effects are not
homogeneous across these groups and assuming monotonicity, estimates like ours are “local
average treatment effects” (LATE). Such LATE estimates correspond to the effect of the 
treatment for those who comply with their intent-to-treat, but will likely differ from that for those 
who always (or never) take up the treatment regardless of the intent-to-treat. A recent study by 
Bertanha and Imbens (2014) provides straightforward guidance on assessing the empirical 
relevance of this possible treatment heterogeneity in “fuzzy” RD applications like ours. 
Specifically, they recommend estimating the reduced-form RD specifications for separate 
samples defined by whether the student took up the treatment (i.e., ES ∈ {0,1}). We report these
results in Table 11 using our saturated model (i.e., column 3 in Table 4). In the first row, we 
repeat our full-sample results as a point of reference. In the second row, we show the estimated 
“jump” in outcomes using only data from students who did not take ES (i.e., ES = 0). In this sub- 
sample, the threshold separates never-takers (i.e., to the left of the threshold) from the population 
of never-takers and compliers who are to the right of the threshold. The fact that outcomes are 
higher to the left of the threshold (i.e., for at least two of the three outcomes) indicates that 
never-takers have unobserved traits that predispose them to better student outcomes relative to 
compliers. Intuitively, this finding suggests that students who insist on taking a health or college 
preparation/study skills course in lieu of ES have unobserved traits that imply better academic 
outcomes. 

The next row identifies the jump at the threshold for each outcome measure using only 
data on students who took ES (i.e., ES = 1). The population to the left of the threshold consists of 
compliers and always-takers while the population to the right only contains always-takers. Our
evidence that each student outcome jumps significantly at the threshold could indicate that taking
the course is more effective for those who only take it when assigned relative to those who insist
on taking it. This could occur, for example, if the ES course is less novel and relevant for the 
types of students who insist on taking it. Overall, these findings are consistent with the type of 
heterogeneity implied by the LATE theorem. As a practical matter, this evidence of treatment 
heterogeneity has salience for the external validity we might expect when scaling up access to 
this course. In particular, these findings suggest that taking the course is less necessary for the 
type of student who refuses to take the course (i.e., never-takers) and less effective for students 
who insist on taking it when available (i.e., always-takers). We revisit issues of scalability in our 
concluding remarks. 
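
The composition of each cell in this split-sample exercise follows directly from the definitions of the compliance types under monotonicity. The sketch below simply encodes that logic; the function and label names are illustrative.

```python
def possible_types(below_threshold, took_es):
    """Compliance types consistent with an observed (assignment, take-up) pair.

    Assumes monotonicity (no "defiers"): nobody takes ES only when NOT encouraged.
    """
    if below_threshold and took_es:
        return {"complier", "always-taker"}
    if below_threshold and not took_es:
        return {"never-taker"}
    if not below_threshold and took_es:
        return {"always-taker"}
    return {"never-taker", "complier"}

# The ES = 0 subsample compares never-takers (below) with never-takers and compliers (above).
es0_below = possible_types(True, False)
es0_above = possible_types(False, False)
# The ES = 1 subsample compares compliers and always-takers (below) with always-takers (above).
es1_below = possible_types(True, True)
es1_above = possible_types(False, True)
```
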
Discussion 

The results presented in this study indicate that the ninth-grade ES curriculum 
implemented in SFUSD led to large and statistically significant improvements in ninth-grade 
GPA, attendance, and credits earned for compliers at the 2.0 eighth-grade GPA threshold. To our 
knowledge, this is the first study to examine the effect of any type of culturally relevant 
pedagogy (CRP) in a quantitative study that supports credible causal inferences. Specifically, our 
“regression discontinuity” (RD) design leveraged a class-assignment rule that encouraged 
academically at-risk students (i.e., those with eighth-grade GPA below 2.0) to take the course. 
We present several forms of evidence that affirm the validity of this discontinuous assignment 
rule as a quasi-experiment, as well as evidence on the robustness of our main findings. We note 
evidence that these large effects appear to reflect both reductions in the probability of dropping
out and improvements in the performance of enrolled students. We also find that the
effects of this course were concentrated among males, Hispanics, and to a lesser degree, Asians.

Taken at face value, these findings provide a compelling confirmation of an extensive
literature that has emphasized the capacity of CRP to unlock the educational potential of 
historically marginalized students. We also stress that our results are consistent with other
theoretical frames. In particular, a field-experimental literature in social psychology has
shown that quite modest interventions that buffer students against stereotype threat can, under 
the right circumstances, dramatically improve student outcomes. ES courses combine several of 
the active ingredients of these interventions (e.g., affirmation, external attribution for difficulties, 
forewarning about stereotypes) and expose students to them in an exceptionally intense and 
persistent manner (i.e., through a year-long course rather than a brief exercise). Furthermore, 
SFUSD’s ES course was also targeted in a manner consistent with such “buffering” interventions 
(i.e., at the beginning of the school year and during a possibly difficult transition to high school). 
Further research that can measure alternative mediators can provide insight into the relevance of 
different theorized mechanisms. 

As a matter of policy and practice, this study’s findings should be interpreted in light of 
several important caveats related to external validity and scalability. First, as in all RD studies, 
our results focus on localized comparisons between students who are just above and below the 
eligibility threshold for ES enrollment. It is, thus, an open question whether the effects of this or 
any other ES curriculum would generalize to higher-performing students. Furthermore, we also 
find evidence that the benefits of taking such a course are larger among those who comply with 
the encouragement to take the course (i.e., relative to students who would always take it when 
available). 

The effects of ES identified by this study are large, and there are several reasons why 
such large effect sizes are credible. First, the effects we estimate are specific to at-risk students 
who have indications of low performance in eighth grade (i.e., those close to the GPA threshold). 
These are students for whom large ninth-grade gains may be feasible following a renewed 
engagement with school. Second, our sample includes students regardless of whether they 
dropped out during the course of their ninth-grade year. This includes the 7.3 percent of students 
who obtained fewer credits than needed to qualify for promotion to tenth grade by the end of the 
school year. The improvements we observe in attendance, GPA, and credits earned are likely to 
reflect, in part, the gains among students who otherwise would have dropped out and had very 
low values for these measures. Third, the interpretation of effect sizes in a fuzzy RD setting 
relies in part on understanding the counterfactual, which is not straightforward. Roughly 15 
percent of students above the threshold took up ES. If these students also benefitted from the 
course, the relevant counterfactual outcomes for students below the threshold who took the 
course would be even lower. Fourth, our estimates can be understood as “local average treatment 
effects” (LATE). The LATE interpretation implies that our estimates identify the effects of 
taking ES for “compliers” (i.e., those who take ES when their eighth-grade GPA is below 2.0 and 
do not when it is above). Our investigations into this treatment effect heterogeneity suggest that 
effects are larger for this group than among those who always or never take the course. Finally, if 
our measure for taking ES (i.e., based on administrative data for the fall semester) is 
characterized by measurement error, it would imply a bias towards zero that is corrected by the 
IV procedure. 
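
The LATE logic in the fourth point can be illustrated with a small simulation. The sketch below is not the estimation code used in this study. It assumes a hypothetical population in which 15 percent of students always take ES, 70 percent are compliers, and 15 percent never take it, with a hypothetical effect of 1.0 for compliers and 0.3 for always-takers. All function names, shares, effect sizes, and the bandwidth are illustrative assumptions. The Wald ratio (the jump in the outcome at the 2.0 threshold divided by the jump in take-up) should land near the complier effect, because always-takers enroll on both sides of the threshold and their gains cancel out of the numerator.

```python
import random


def fitted_value_at_cutoff(xs, ys):
    """Fit y = a + b*x by OLS and return a, the fitted value at x = 0.

    xs are GPAs already centered on the cutoff, so x = 0 is the threshold.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x


def fuzzy_rd_wald(gpa, took_es, outcome, cutoff=2.0, bandwidth=0.5):
    """Return (late, first_stage, reduced_form) from local linear fits.

    Students below the cutoff are the ones encouraged to take ES, so each
    jump is defined as (limit from below) minus (limit from above).
    """
    rows = list(zip(gpa, took_es, outcome))
    below = [(g - cutoff, d, y) for g, d, y in rows
             if cutoff - bandwidth <= g < cutoff]
    above = [(g - cutoff, d, y) for g, d, y in rows
             if cutoff <= g <= cutoff + bandwidth]

    def limit(side, column):
        return fitted_value_at_cutoff([r[0] for r in side],
                                      [r[column] for r in side])

    first_stage = limit(below, 1) - limit(above, 1)    # jump in take-up
    reduced_form = limit(below, 2) - limit(above, 2)   # jump in outcome
    return reduced_form / first_stage, first_stage, reduced_form


def simulate(n=20000, complier_effect=1.0, always_taker_effect=0.3, seed=0):
    """Simulate hypothetical students; all shares and effects are assumptions."""
    rng = random.Random(seed)
    gpa, took_es, outcome = [], [], []
    for _ in range(n):
        g = rng.uniform(1.5, 2.5)            # eighth-grade GPA near the cutoff
        u = rng.random()
        always_taker = u < 0.15              # takes ES on either side
        complier = 0.15 <= u < 0.85          # takes ES only if GPA < 2.0
        d = 1 if always_taker or (complier and g < 2.0) else 0
        effect = always_taker_effect if always_taker else complier_effect
        y = 0.5 * g + effect * d + rng.gauss(0.0, 0.5)
        gpa.append(g)
        took_es.append(d)
        outcome.append(y)
    return gpa, took_es, outcome


if __name__ == "__main__":
    late, first_stage, reduced_form = fuzzy_rd_wald(*simulate())
    print(f"first stage: {first_stage:.3f}")
    print(f"reduced form: {reduced_form:.3f}")
    print(f"Wald (LATE) estimate: {late:.3f}")
```

In this setup the estimate tracks the complier effect rather than a population average, which is why an effect that is larger for compliers than for always-takers translates directly into a larger fuzzy RD estimate.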

In spite of the robustness of the findings in the present study, there are several reasons to 
be cautious about the likely effect of scaling up or replicating this ES course. The 
implementation of ES in SFUSD was, arguably, conducted with a high degree of fidelity, 
forethought, and planning. In particular, it appeared to draw upon the work of a core group of 
dedicated teachers, engaging in a regular professional learning community, with outside support 
from experts in the subject to create and sustain the program. Scholars from a number of 
disciplines have noted that the effects of such smaller-scale interventions are often very different 
when the same policies are implemented at scale (Dodge, 2011; Welsh, Sullivan, & Olds, 2010). 
The broader school, district, and community contexts in which this course was situated may also 
be relevant. For example, the literature on stereotype threat stresses that the success of buffering 
interventions depends critically on settings that can enhance and encourage positive “recursive” 
processes related to student engagement and success (Yeager & Walton, 2011). Nonetheless, 
SFUSD’s ES program appears to constitute an important proof of concept, indicating that 
culturally relevant pedagogy can be extraordinarily effective in supporting the academic 
progression of struggling students. 

i See http://nces.ed.gov/nationsreportcard/naepdata/ for data on the main NAEP scale scores and standard deviations. 
Bloom, Hill, Black, and Lipsey (2008) provide guidance on interpreting effect sizes as years of learning. 

ii In fact, some partially attribute the development of CRP to the academic discipline of ethnic studies (Yosso, 
Parker, Solorzano, & Lynn, 2004). 

iii In our main results, we define GPA and credits earned excluding the ES course and all other social studies courses 
(and physical education) to avoid possible confounds related to differences in assessment norms across different 
courses. We also show results specific to mathematics, science, and English/Language Arts courses. 

iv This term is similar to terms such as “culturally appropriate,” “culturally congruent,” “culturally compatible,” and 
“culturally responsive” teaching or pedagogy (Gay, 2010; Ladson-Billings, 1992b; 1995; Sleeter, 2011). We use 
Ladson-Billings’ term, culturally relevant pedagogy, throughout because the ES course in SFUSD reflects her 
conception of pedagogical practice that both, “addresses student achievement and also helps students to accept and 
affirm their cultural identity while developing critical perspectives that challenge inequalities that schools (and other 
institutions) perpetuate,” (Ladson-Billings, 1995, p. 469). 

v This appears to be true of CRP, more generally. One possible exception is a recent randomized trial by Kisker et al. 
(2012), which found that a culturally relevant math curriculum significantly improved the performance of 
second-grade Alaskan Natives. However, these gains may conflate the effects of general instructional quality with 
those of cultural relevance. The intervention included teacher training and also improved the performance of students who 
were not Alaskan Natives. 

vi As noted by Cabrera and colleagues (2014), the development of this program was technically unrelated to AB 2281 
and was instead a solution to a 40-year-old desegregation order for TUSD. 

vii This primary analytic sample is restricted to students who are in schools that offered the MAS curriculum. A 
secondary set of analyses includes nearly 17,000 students in all TUSD schools, including those without MAS 
programs. 

viii Cohort-specific results suggest that this association was not significant for all tests in all years, particularly in the 
final 2011-2012 cohort. The authors speculate that the political turmoil surrounding the program in this year might 
have weakened its effectiveness, or that the expansion of MAS offerings to additional schools might also have 
contributed to the lack of significant results. 

ix We exclude the few students with attendance rates below the threshold from our analysis. This implies that we are 
estimating a “frontier” RD (Reardon & Robinson, 2012). 

x The EWI system was created through a partnership between SFUSD and the John Gardner Center at Stanford 
University. This partnership identified two key predictors of students’ likelihood of dropping out of high school: an 
eighth-grade GPA below 2.0 and an attendance rate below 87.5 percent. 

xi Similarly, students who had not been identified using the EWI system could opt in to the course after consulting 
with their counselors if they desired to enroll, but were not automatically assigned to the course. 

xii Specifically, we examine the effectiveness of these teachers relative to their peers in other courses (i.e., other than 
ethnic studies). 

xiii Based on the limited data available, we suspect some of the students with very low eighth-grade GPA have unique 
special-education circumstances or missing data. 

xiv SFUSD’s Asian-American population is also quite diverse in terms of country of origin and includes students of 
Chinese, Filipino, Vietnamese, and Indian heritage, among others. Results run separately by country of origin are 
qualitatively consistent with the main findings, particularly for Chinese and Filipino students, although the sample 
sizes are small enough that not all of the estimates remain statistically significant. 

xv We define treatment uptake as being enrolled in the first-semester ES course, regardless of whether a student 
remained in the course. Among students in our sample, all attained final grades for the first semester of the course 
and only 7 percent did not obtain final grades for the second semester. 

xvi We are unable to examine effects on standardized tests because our cohorts overlap the period in which California 
switched from the California Standards Test (CST) to the Smarter Balanced Assessments System, including a 
transition year in which no test was administered. 

xvii In our analysis sample, only one student has missing data in the spring of ninth grade (i.e., for the GPA and 
credits earned measures but not the attendance rate). 

xviii Robust standard errors correct for any heteroskedasticity following recommendations by Angrist and Pischke 
(2010) who argue that robust standard errors are preferred when you estimate a linear model on a data-generating 
process that is non-linear (i.e., even in the presence of classically homoscedastic errors). Models estimated with 
classical standard errors yield identical conclusions. 

xix Interested readers should consult pages 235-238 for more details on two-stage least squares (2SLS) IV estimates. 

xx Ninth-grade students in SFUSD typically take a ninth-grade English course, either Algebra 1 or Geometry, and 
either Biology or Physics. 

xxi For example, for ninth-grade GPA, the ranks of the four ethnic-studies teachers were 9, 14, 30, and 35 among 38 
total social studies teachers. The standard deviations of these fixed effects estimates ranged from 0.22 SD (GPA) to 
0.54 SD (credits earned), which, while somewhat larger than the standard deviations of teacher value-added 
measures to test scores, are in line with estimates of value-added to non-test academic outcomes (e.g., Gershenson, 
2016). 

xxii The in-progress scale-up of this course across SFUSD high schools may provide opportunities to explore this 
heterogeneity. 


